Abstract. This paper is intended not as a survey, but as an introduction to some
ideas behind the class of mesh adaptive direct search (MADS) methods. Space
limitations dictate a brief description of various key topics to be provided along
with several references, which themselves provide further references.

The convergence theory for the methods presented here makes a case for closing
the gap between nonlinear optimizers and nonsmooth analysts. However, these
methods are certainly not of purely theoretical interest; they are successful on
difficult practical problems. To encourage further use, we give references to
available implementations. MADS is implemented in the direct search portion of
the MathWorks MATLAB Genetic Algorithm and Direct Search (GADS) Toolbox.

Keywords: Mesh adaptive direct search algorithm, filter algorithm, barrier
approach, constrained optimization, nonlinear programming.


1  Introduction  -  the  problem  and  its  properties 

For us, derivative-free optimization excludes methods that use standard finite
difference approximations to derivatives in a Newton or SQP algorithmic framework.
Those are well established and valuable methods. Indeed, there are many reasons
to use them in place of truly derivative-free methods like the ones treated here
if one can. However, our target class of problems is not amenable to such an
approach.

Since this is to be only one of several papers devoted to derivative-free
optimization, we will concentrate on summarizing our work without making an
attempt to give a survey of the topic. This is a relief because derivative-free
optimization is already a diverse area of optimization, and it is growing fast, due in
part to the importance of these algorithms for applications.

Our  interest  in  the  topic  of  direct  search  methods  came  directly  from  users, 
and  our  interaction  with  users  continues  to  be  our  strongest  influence.  However, 
it  would  be  incorrect  to  assume  that  these  methods  are  not  interesting  for  their 
own  sake.  We  have  found  all  the  theoretical  challenge  we  would  wish  for  in  this 
area. This leads to another point we will try to make: derivative-free optimization
will and should form closer ties between computational optimization and
nonsmooth analysis. We believe that nonsmooth analysis should be included in any
curriculum meant to train nonlinear optimization researchers.

In  this  paper,  we  consider  the  general  nonlinear  optimization  problem: 

$$\min_{x \in \Omega} f(x) \qquad (1)$$

where $\Omega = \{x \in X : c_j(x) \le 0,\ j \in J\} \subset \mathbb{R}^n$ and
$f, c_j : X \to \mathbb{R} \cup \{\infty\}$ for all $j \in J = \{1, 2, \ldots, m\}$,
and where $X$ is a subset of $\mathbb{R}^n$. No differentiability assumptions
on the objective and constraints are required for these algorithms. However,
the strength of the optimality results at a limit point is closely tied to the local
smoothness of the functions there and to properties of the tangent cone to $\Omega$ at a
limit point produced by the algorithm.

We treat $X$ and $C(x) = (c_1(x), c_2(x), \ldots, c_m(x))^T \le 0$ differently because they
are intended to model different classes of constraints. The set X includes the set
of points to which x must belong in order that the functions $f(x)$ and $C(x)$ can be
evaluated. We only require that the user provides a routine that says whether or
not x is in X. We refer to these constraints as “yes/no” or oracular constraints.

Another interesting aspect of these problems is that even when $x \in \Omega$, we may
not get a value for $f(x)$ or $C(x)$, though it may take as long to find that out as it
would have if we had been able to get a value. We model this situation by setting
$f(x) = +\infty$. This happens, for example, in some multi-disciplinary optimization
(MDO) problems, in which getting a value depends on runtime linking of legacy
PDE solvers [?].
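
The convention of returning $+\infty$ for failed evaluations and for points outside X
can be sketched in a few lines of Python. This is only an illustration under
stated assumptions: `simulate` and `in_X` are hypothetical user-supplied callables
(they do not come from the paper), and a failure is assumed to show up as an
exception, a `None`, or a NaN.

```python
import math

def make_blackbox(simulate, in_X):
    """Wrap a simulation so that failures and points outside X give +inf.

    `simulate` and `in_X` are hypothetical user-supplied callables.
    """
    def f(x):
        if not in_X(x):              # oracular "yes/no" constraint
            return math.inf
        try:
            value = simulate(x)
        except Exception:            # the simulation crashed: failed evaluation
            return math.inf
        if value is None or math.isnan(value):
            return math.inf
        return value
    return f

# Toy stand-ins, for illustration only.
def toy_simulate(x):
    if x[0] > 2.0:
        raise RuntimeError("solver did not converge")
    return (x[0] - 1.0) ** 2

def toy_in_X(x):
    return x[0] >= 0.0

f = make_blackbox(toy_simulate, toy_in_X)
```

Note that the wrapper cannot recover the time lost on a failed run; it only makes
the failure visible to the optimizer as an infinite value.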

Every trial point, as well as the initial point, must satisfy the X constraints, but
$C(x) \le 0$ is only required to hold at the solution. In fact, the user is often interested
in how much optimality is possible with a slight relaxation of these constraints.
We call these open constraints, and we treat them by a modification, given in [14],
of the filter method.

Filter algorithms were introduced by Fletcher and Leyffer [?] as a way to
globalize sequential linear and quadratic programming (SLP and SQP) without
using any merit function for weighting the relative merits of improving feasibility
and optimality. A filter algorithm introduces a function that aggregates
constraint violations and then treats the resulting biobjective problem. A trial point
is accepted if it either reduces the value of the objective function or that of the
constraint violation; otherwise it is said to be filtered. Although this clearly is
less parameter-dependent than a penalty function, specifying a constraint violation
function still implies an assignment of relative weights to reducing the violation
of each constraint. The algorithm maintains feasibility with respect to X by
modifying the aggregate constraint violation for $\Omega$ to be $+\infty$ outside of X.
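
A minimal sketch of the acceptance test may help fix ideas. It assumes one common
choice of aggregate violation, the sum of squared positive parts of the
constraint values; the paper does not prescribe this choice here, and the
function names are illustrative.

```python
import math

def violation(c_values, in_X=True):
    """Aggregate violation h: sum of squared positive parts of c_j(x),
    set to +inf outside X (an assumed, commonly used choice)."""
    if not in_X:
        return math.inf
    return sum(max(0.0, c) ** 2 for c in c_values)

def is_filtered(point, filter_points):
    """`point` and the filter entries are (h, f) pairs. A point is filtered
    when some stored pair is at least as good in both h and f."""
    h, f = point
    if math.isinf(h):
        return True
    return any(hk <= h and fk <= f for hk, fk in filter_points)

def add_to_filter(point, filter_points):
    """Insert an unfiltered point and drop the entries it dominates."""
    h, f = point
    kept = [(hk, fk) for hk, fk in filter_points if not (h <= hk and f <= fk)]
    kept.append(point)
    return kept
```

With a filter holding `(0.0, 5.0)` and `(1.0, 3.0)`, the pair `(0.5, 4.0)` is
unfiltered because it improves on each stored entry in at least one measure.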

A key feature of the optimization problems we most often meet in practice is
that they involve running an expensive simulation to get ancillary variables needed
to evaluate the blackbox function codes that define $f$ and $C$. This means that we
need to be parsimonious with function and constraint values, and it also implies
that there are likely to be few correct digits in the output. As a result, derivatives
are unlikely to exist everywhere, and if they do exist, difference quotients
are not likely to give derivative approximations suited to use in derivative-based
algorithms.

Often in practice, users express a desire to obtain the “global optimizer” of
$f(x)$ on $\Omega$. As we have described the problems, this is not something any
algorithm can guarantee in practice. Still, global optimization algorithms generally
find useful solutions when they can be applied to real problems. Indeed, with
some attention to globality, the algorithms we outline here give equally useful
solutions. We believe that this is because of synergy between this user request
and another important user desire: robustness. In this context, one can think of a
robust optimizer as one occurring in a broad valley. Such “global optimizers” are
rather easier to find than those belonging to a narrow, but deeper basin.


2 What are mesh adaptive direct search (MADS) methods?

The methods we consider here are direct search methods. As the name mesh adaptive
direct search (MADS) implies, these methods generate iterates on a tower of
underlying meshes on the domain space. However, also as the name implies, they
perform an adaptive search on the meshes, including controlling the refinement
of the meshes. The reader interested in the rather technical details should read
[?]. Here we ask the reader to imagine an underlying mesh and an algorithm
for generating trial points on the mesh and adapting the fineness of the mesh to
approach a local optimizer. We stress that the full mesh is never explicitly
generated.

It is possible to dispense with the mesh as in [43, 44], which seems a
simplification on the face of it. The argument against doing away with the mesh is that
one must then use a sufficient decrease condition rather than accepting any point
that provides simple decrease. Sufficient decrease conditions in the derivative-free
situation are not as simple as a backtracking Goldstein-Armijo strategy in the
quasi-Newton case [30]. Our suspicion is that whether or not to use the mesh
is a matter of taste, not of algorithmic effectiveness, though we have no actual
experience without the mesh on real problems.

Above, we mentioned the utility of nonsmooth analysis in derivative-free
optimization. MADS is a case in point. We discovered MADS as a direct result of
weaknesses in the generalized pattern search (GPS) class of algorithms [?] when
applied to nonsmooth problems, which were exposed when we used nonsmooth
analysis to analyze GPS [12, 14].

We  could  also  call  the  methods  considered  SEARCH  -  POLL  methods  because 
each  iteration  consists  of  two  steps,  SEARCH  and  POLL  .  The  goal  of  an  iteration 
is  to  find  unfiltered  points  in  A.  If  SEARCH  fails  to  find  an  unfiltered  point,  then 
POLL  is  executed,  and  if  POLL  does  not  succeed,  then  the  mesh  is  refined. 
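
The iteration structure just described can be sketched as follows. This is a
deliberately simplified illustration, not the MADS algorithm itself: it uses
simple decrease on a single objective instead of a filter, fixed coordinate poll
directions in the style of GPS, and an illustrative halving rule for mesh
refinement. The optional `search` callable is a hypothetical hook returning mesh
points to try before polling.

```python
def search_poll(f, x0, delta=1.0, delta_min=1e-6, max_evals=2000, search=None):
    """Simplified SEARCH/POLL loop with simple decrease (no filter).

    `search` is an optional hypothetical callable (x, delta) -> list of mesh
    points; when omitted, the SEARCH step is empty.
    """
    n = len(x0)
    x, fx = list(x0), f(list(x0))
    evals = 1
    # Fixed poll directions +e_i and -e_i (a GPS-style choice, not MADS).
    directions = []
    for i in range(n):
        for sign in (1.0, -1.0):
            directions.append([sign if j == i else 0.0 for j in range(n)])

    while delta > delta_min and evals < max_evals:
        success = False
        search_points = list(search(x, delta)) if search else []
        poll_points = [[xi + delta * di for xi, di in zip(x, d)]
                       for d in directions]
        # SEARCH points are tried first; POLL points only if SEARCH fails.
        for y in search_points + poll_points:
            if evals >= max_evals:
                break
            fy = f(y)
            evals += 1
            if fy < fx:              # opportunistic: accept first improvement
                x, fx, success = y, fy, True
                break
        if not success:
            delta *= 0.5             # mesh local optimizer: refine the mesh
    return x, fx, delta
```

On a smooth toy function such as `(p[0] - 0.3)**2 + (p[1] + 0.7)**2` started from
the origin, the loop stops once the mesh size falls below `delta_min`, with the
incumbent close to the minimizer.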

The SEARCH step is crucial in practice because it is so flexible, but it is a
difficulty for the theory for the same reason. SEARCH can return any point on the
underlying mesh, but of course, it is trying to identify an unfiltered point. Any
aspiration to find a local minimizer in a deeper basin than the one we start in is
concentrated in the SEARCH step. When we discuss some SEARCH strategies, we
will justify this point.

The POLL step is more rigidly defined, though there is still some flexibility
in how this is implemented. Since the POLL step is the basis of the convergence
analysis, it is the part of the algorithm where most research has been concentrated.

Lewis and Torczon [39] recognized that POLL should consider points on the
mesh neighboring the incumbent solution in a set of directions whose nonnegative
linear combinations span the space. This may seem simple, but it is a crucial
observation. Coope and Price [21, ?, 23] extended this notion to the idea of
frames, which can be thought of as doing away with the requirement that the
POLL points be mesh neighbors. Audet and Dennis [13] suggested MADS as
a way to implement frames so that the directions used in infinitely many POLL
steps generate a dense set in the tangent cone at a MADS limit point $x \in X$. This
allows strong convergence results [13, 5] and excellent computational results for
the MADS algorithms [15, 16, 41].
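
To see why positive spanning matters, recall that for a smooth function with a
nonzero gradient, a set of directions whose nonnegative combinations span the
space always contains at least one direction making a negative inner product
with the gradient, that is, a descent direction. The small sketch below checks this
property for two standard positive bases; the function names are illustrative.

```python
def minimal_positive_basis(n):
    """n + 1 directions: the coordinate vectors plus minus their sum."""
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    basis.append([-1.0] * n)
    return basis

def maximal_positive_basis(n):
    """2n directions: plus and minus each coordinate vector."""
    plus = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    minus = [[-v for v in d] for d in plus]
    return plus + minus

def has_descent_direction(gradient, directions):
    """True if some direction has a negative inner product with the gradient."""
    return any(sum(g * d for g, d in zip(gradient, direction)) < 0.0
               for direction in directions)
```

By contrast, the coordinate vectors alone span the space in the ordinary sense
but not positively: for the gradient (1, 1, 1) none of them is a descent direction.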

2.1  Some  search  strategies 

The SEARCH step can be empty. By this we mean that the algorithm can be
implemented as a sequence of POLL steps only. This is a reasonable choice when a
local minimizer in the same basin as the initial guess is sufficient. Another
reasonable strategy is to try a step in the same direction as a previously successful
POLL step. It must be said that although this seems reasonable, we understand
that some researchers have found this approach of limited value at best.

We  have  experimented  with  random  search  as  a  SEARCH  strategy.  This  has 
some  success  on  the  initial  iteration,  but  it  seems  to  be  a  waste  of  function  values 
after  that. 

In our experience, the best SEARCH strategies involve the use of surrogates
for $f$ and $C$. We use surrogate to mean an inexpensive function that the user
can employ to look extensively on the current mesh for points that the surrogate
predicts will improve the current incumbent solution. Surrogates generally are of
two types: simplified physics simulations, and surfaces fit to a set of points in X
usually chosen by some space filling design. We use the term surrogate rather
than approximation because we do not want to imply that anything is required
with respect to how well the surrogates approximate the problem functions [18].
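
As a rough sketch of a surrogate-based SEARCH, the function below ranks nearby
mesh points by an inexpensive surrogate and returns only the most promising few
for evaluation with the expensive functions. The exhaustive enumeration, the
`radius` and `keep` parameters, and the function name are all assumptions made
for illustration; real implementations optimize the surrogate more cleverly.

```python
import itertools

def surrogate_search(x, delta, surrogate, radius=3, keep=3):
    """Return up to `keep` mesh points near x, best surrogate value first.

    Mesh points are x + delta * z for integer vectors z with entries in
    [-radius, radius]; the incumbent itself (z = 0) is skipped.
    """
    n = len(x)
    candidates = []
    for steps in itertools.product(range(-radius, radius + 1), repeat=n):
        if any(steps):
            candidates.append([xi + delta * s for xi, s in zip(x, steps)])
    candidates.sort(key=surrogate)   # cheap: only the surrogate is evaluated
    return candidates[:keep]
```

Only the returned points would then be passed to the true problem functions,
which keeps the number of expensive evaluations per SEARCH step small.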

Boeing uses DACE surrogates [46] in their Design Explorer filter implementation
[10]. They generate data sites by an orthogonal array, and then fit a DACE
model to the data. The SEARCH consists of a global Newton SQP method applied
to the surrogate problem to try to generate several good local optimizers for that
problem. Then they use the expensive “true” problem functions at those points
to decide whether the SEARCH has been successful. Whenever new values of the
true problem functions have been computed, they are used to recalibrate the
surrogates. This surrogate management framework leads to very successful methods.
Details are given in [10].

Alison Marsden has solved trailing edge shape design problems using both
types of surrogates in an insightful way. She generates trial points using the
MATLAB DACE surrogate package [40] and then uses a less expensive turbulence
model to check whether a trial point is in A. If it is, then she runs the more
accurate simulation. Her SEARCH consists of applying an evolutionary algorithm to
the DACE surrogates. See [?] for details.

Another interesting application of surrogates is in [15], where a framework to
identify good algorithmic parameter values is given. To illustrate this framework,
MADS was applied to an objective function that measured the CPU time required
by a trust-region algorithm [32] to solve a set of difficult problems. A natural
surrogate function was constructed by having the trust-region method solve a
different list of easy problems.

2.2  The  poll  step 

The POLL step is more rigidly defined than the SEARCH step. It is necessarily
called when the SEARCH fails to produce an unfiltered point. The POLL step
consists of a local exploration around the current incumbent solution. The trial
points are generated in some directions scaled by a mesh size parameter. When
either the SEARCH or the POLL step is successful, the mesh size parameter is
either kept constant or increased. Otherwise, when both steps fail to generate an
unfiltered point, the incumbent is declared to be a mesh local optimizer [?] and
the mesh size parameter is decreased.

In GPS, the POLL directions were restricted to belong to some finite set. The
GPS convergence results [12, 14] are closely tied to these fixed directions.
Furthermore, there are some known examples [?] for which GPS falls short of converging
to a satisfactory solution because of this restriction.

MADS overcomes this limitation by allowing a larger set of POLL directions.
In fact, as k (the iteration counter) goes to infinity, the union of the normalized
POLL directions over all k becomes dense in the unit sphere. This algorithmic
construction allows stronger convergence results [13].
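
One way to picture a direction set that changes from iteration to iteration is to
draw a random unit vector v and use the columns of the Householder matrix
H = I - 2vv^T, together with their negatives, as poll directions. This is only an
illustrative sketch and is not claimed to be the construction used in the cited
work: in particular it ignores the requirement that poll points lie on the
current mesh, and the function name is hypothetical.

```python
import math
import random

def householder_poll_directions(n, rng):
    """Return 2n unit poll directions [H, -H] with H = I - 2 v v^T,
    where v is a random unit vector drawn with `rng`."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    v = [vi / norm for vi in v]
    # Column j of H, stored as a list of its n entries.
    columns = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] for i in range(n)]
               for j in range(n)]
    return columns + [[-c for c in col] for col in columns]
```

Because H is orthogonal, each call yields a maximal positive basis, and repeated
calls with fresh random vectors produce directions that are not confined to any
fixed finite set.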

In some cases, incomplete derivative information may be available. For example,
in some MDO problems, derivatives for some disciplines may be available,
but not for others, and derivatives across multiple disciplines are not available. If
the full gradient is available, directions can be chosen so that all but one are ascent
directions, which can be ignored, thus reducing the required number of function
evaluations to one per iteration. In this case, MADS reduces to an approximation
of steepest descent. Even if only some partial derivatives are known, MADS
can exploit this information to reduce the number of function evaluations in each
POLL step [6] without sacrificing theoretical convergence properties. In related
work, Custodio and Vicente [26] compute a simplex gradient from a subset of
previously evaluated points having certain geometrical properties, and they have
studied its use as a potential direction of descent in an effort to speed convergence.

Since MADS is opportunistic, in that it moves immediately to a new improved
mesh point as soon as it is found, the order in which POLL points are evaluated
can impact performance. One approach in which we have witnessed such
improvement is what we call dynamic polling, in which the most recent successful
direction is moved to the front of the queue after each successful POLL step.
Dynamic polling was shown useful in [13] on a chemical engineering parameter fit
problem [33]. If we were to use a surrogate in the SEARCH step, then evaluating
the surrogate at each POLL point and then ordering them by surrogate function
value would also be a wise choice. Custodio and Vicente [26] have also seen a
reduction in function evaluations by computing a simplex gradient and ordering
POLL points according to how small an angle the corresponding poll directions
make with the negative of the simplex gradient. One must keep in mind, however,
that these strategies (dynamic polling, surrogate and simplex gradient ordering)
do not necessarily lead to improved computational times in all cases.
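
The first two ordering strategies are simple to express in code. The sketch below
assumes poll directions are stored as a list of vectors and that `surrogate` is a
hypothetical inexpensive callable; neither helper name comes from the paper.

```python
def reorder_after_success(directions, successful_index):
    """Dynamic polling: move the most recent successful direction to the
    front of the queue, keeping the relative order of the others."""
    chosen = directions[successful_index]
    rest = directions[:successful_index] + directions[successful_index + 1:]
    return [chosen] + rest

def order_by_surrogate(x, directions, delta, surrogate):
    """Order poll directions by the surrogate value at the poll point
    x + delta * d, most promising first."""
    def poll_value(d):
        return surrogate([xi + delta * di for xi, di in zip(x, d)])
    return sorted(directions, key=poll_value)
```

Both helpers only change the order of evaluation; the set of poll points, and
hence the convergence analysis built on it, is unchanged.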


3  Why  study  these  methods 

In  previous  sections,  we  have  mentioned  some  applications  of  MADS.  In  this 
section,  we  will  make  some  general  remarks  about  applications,  but  since  the 
interested  reader  can  find  all  the  details  we  can  furnish  in  the  referenced  papers, 
we  save  space  here. 

Also  in  this  section  we  will  discuss  the  theoretical  support  for  MADS.  We  hope 
that  other  derivative-free  optimization  researchers  will  consider  using  nonsmooth 
analysis  to  analyze  their  methods.  The  discovery  of  MADS  was  a  direct  result  of 
our  nonsmooth  analysis  of  GPS,  and  that  has  made  us  enthusiastic  about  building 
a  bridge  to  this  advanced  theoretical  part  of  our  discipline. 

3.1 Importance in practice

It  is  likely  that  every  paper  in  these  special  issues  will  make  a  case  for  the  practical 
importance  of  derivative-free  optimization  methods.  We  second  everything  the 
other  authors  say,  but  we  will  use  our  space  here  to  try  to  make  a  couple  of  points 
that  other  authors  may  not  make. 

The first point is that, in spite of the formidable aspects of our target class
of problems, we are often able to solve them quite efficiently. The main reason
we were the first to solve them is that there are barriers to applying traditional
derivative-based methods, and heuristic searches use too many function evaluations
for these relatively expensive problem functions.

These problems typically take minutes to weeks for each function value, and
many of them have what Tim Kelley [19], who also sees such problems, likes
to call “hidden constraints”. This second point means that one calls the simulation
codes that must be run to evaluate the functions for perfectly innocuous
arguments, and they fail. Furthermore, they fail after running for about the same
length of time as when they succeed. In [18], this sort of failure happens to us
roughly twice in every three function calls.

The main reason we have seen for these evaluation failures is that the function
evaluations depend on runtime linking of single discipline solvers; e.g., separate
structures and fluids codes. This is characteristic of multidisciplinary design
optimization (MDO) problems [?]. The interested reader will find a vast amount
of MDO literature on the web.

Thus, to get the ancillary variables needed to evaluate the objective and
constraints, one must do a multidisciplinary analysis (MDA), meaning the runtime
linking of the codes. In our experience, an MDA can be thought of as solving a large
system of nonlinear equations for which no Jacobian information is practical. In such
a circumstance, there is little one can try except simple successive substitution
or nonlinear Gauss-Seidel. This is sensitive to the order in which the blocks are
processed, and it is apt to fail.
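
A toy version of such an analysis, with two scalar "disciplines" coupled through
their outputs, shows both the method and its fragility. Everything here is an
assumption made for illustration: `g1` and `g2` stand in for discipline solvers,
and a `None` return stands for a failed evaluation.

```python
def multidisciplinary_analysis(x, g1, g2, y2_init=0.0, tol=1e-10, max_iter=100):
    """Nonlinear Gauss-Seidel on the coupled system
    y1 = g1(x, y2), y2 = g2(x, y1).

    Returns (y1, y2), or None if the sweeps do not converge."""
    y2 = y2_init
    y1 = g1(x, y2)
    for _ in range(max_iter):
        y2_new = g2(x, y1)        # second block uses the latest y1
        y1_new = g1(x, y2_new)    # first block uses the latest y2
        if abs(y1_new - y1) <= tol and abs(y2_new - y2) <= tol:
            return y1_new, y2_new
        y1, y2 = y1_new, y2_new
    return None

# Weak coupling: the sweeps contract and converge.
weak = multidisciplinary_analysis(
    1.0, lambda x, y2: x + 0.5 * y2, lambda x, y1: 1.0 + 0.5 * y1)

# Strong coupling: the same scheme diverges and the evaluation "fails".
strong = multidisciplinary_analysis(
    1.0, lambda x, y2: x + 2.0 * y2, lambda x, y1: 1.0 + 2.0 * y1)
```

In the strongly coupled case the optimizer receives no function value even
though the design point itself is unremarkable, which is exactly the behaviour of
a hidden constraint.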

Another  difficulty  inherent  to  some  of  the  target  problems  is  that  the  functions 
are  often  contaminated  with  noise.  It  is  not  infrequent  that  evaluating  a  function 
twice  at  the  same  value  of  x  returns  slightly  different  values. 

3.2  Theoretical  support 

These algorithms are intended to be applied to nonsmooth problems, or to any
problems for which derivatives are impractical, even by finite differences.
Typically, both the objective function and the constraints are evaluated by running a
black box computer code. There is no way one can measure the smoothness of
these functions.

The convergence results state that if the MADS algorithm is applied to such
problems, then some optimality conditions are guaranteed. In [12] and [13] we
give a hierarchy of convergence results based on various degrees of smoothness
of the objective and constraints.

At  the  bottom  of  the  hierarchy,  we  have  a  result  that  if  the  iterates  produced 
by  the  algorithm  are  bounded,  then  there  is  an  x,  which  is  the  limit  of  mesh  local 
optimizers  on  meshes  that  get  infinitely  fine.  Assuming  a  bounded  sequence  of 
iterates  is  a  standard  assumption  in  nonlinear  optimization,  and  it  holds  for  our 
algorithms  if  the  initial  level  set  is  bounded. 

Then, by adding more smoothness, the local optimality results become successively
stronger for a limit point x. At the smoothest end of the hierarchy, we
have that if $f$ is strictly differentiable near x, and if the constraint qualification
holds that the tangent cone $T_\Omega(x)$ to the feasible region $\Omega$ at $x \in \Omega$ is
non-empty and full-dimensional, then the directional derivative $f'$ satisfies

$$f'(x; d) \ge 0 \quad \text{for every } d \in T_\Omega(x).$$

This is the KKT first-order optimality condition: there are no feasible strict
descent directions. In the unconstrained case, the tangent cone is the entire space,
and this last condition becomes $\nabla f(x) = 0$.

The intermediate results of the convergence analysis are based on different
degrees of smoothness. The directional derivatives $f'$ are not appropriate to deal
with nonsmooth functions, as they may be undefined. We turned to the nonsmooth
community and found exactly the analytical tool that we needed to analyze the
convergence of our methods: the Clarke Calculus [20].

Clarke proposes a generalization $f^\circ(x; d)$ of the directional derivative for
locally Lipschitz functions, and generalizations [?, 20, 35] of the tangent cone;
namely, the hypertangent cone $T_\Omega^{H}(x)$, the Clarke tangent cone $T_\Omega^{Cl}(x)$, and the
contingent cone $T_\Omega^{Co}(x)$. Armed with these definitions, we can show that, depending
on the smoothness, the limit point x generated by MADS satisfies

$$f^\circ(x; d) \ge 0 \quad \text{for every } d \in T_\Omega^{H}(x),\ T_\Omega^{Cl}(x),$$

or in $T_\Omega^{Co}(x)$.
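
For readers unfamiliar with the notation, the standard definition of the Clarke
generalized directional derivative of a locally Lipschitz function is recalled
below, followed by a one-dimensional illustration that is ours rather than the
paper's.

```latex
% Clarke generalized directional derivative of a locally Lipschitz f at x
% in the direction d:
f^\circ(x; d) \;=\; \limsup_{y \to x,\; t \downarrow 0} \frac{f(y + t d) - f(y)}{t}.

% Illustration: f(x) = |x| is not differentiable at 0, yet
f^\circ(0; d) \;=\; |d| \;\ge\; 0 \quad \text{for every } d \in \mathbb{R},
% so the origin satisfies the stationarity condition above.
```

When $f$ is strictly differentiable at x, $f^\circ(x; d)$ coincides with
$\nabla f(x)^T d$, which is how the smooth result is recovered as a special case.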

Furthermore, in [5], we discover that with more smoothness (namely, that $f$ is
twice strictly differentiable near x), x satisfies a second-order Clarke-KKT
necessary condition for optimality that depends on a generalization of the Hessian
matrix [34]. In fact, with additional assumptions, x satisfies a second-order
Clarke-KKT sufficient condition for optimality, thus ensuring convergence of MADS to
a local minimizer [?].

In stating these results, we make the assumption that the set of directions used
infinitely often is dense in the hypertangent cone at x. As stated earlier, MADS
is designed specifically so that this can be accomplished, but in order to do it in
practice, our selection of positive spanning directions is done randomly.
Consequently, most of our convergence results hold with probability one.


4  What  is  still  needed 

There  are  practical  issues  we  still  need  to  deal  with  for  the  class  of  problems 
discussed  above.  Anyone  who  has  worked  with  users  has  had  the  experience  of 
being  told  that  the  problem  has  a  certain  property,  e.g.,  ten  design  variables,  only 
to  be  told  after  solving  the  problem  that  it  would  be  nice  to  be  able  to  deal  with 
one  hundred  design  variables.  This  is  a  sure  sign  of  progress  in  the  project.  In  this 
section,  we  will  give  brief  descriptions  of  some  of  the  main  issues  raised  by  users 
after  an  initial  success  with  the  first  formulation. 

4.1  Categorical  variables 

Nonlinear  mixed  integer  problems  are  hard  enough,  but  many  engineering  design 
problems  involve  categorical  variables.  These  are  discrete  variables  constrained 
to  a  discrete  set  as  a  part  of  X.  The  problem  functions  cannot  be  evaluated  unless 
all  categorical  variables  take  on  feasible  discrete  values.  For  example,  simulating 
an  oil  field  with  25.3  oil  wells  is  out  of  the  question  unless  one  interpolates  and 
thereby  increases  the  number  of  expensive  simulations  required. 

We use the term mixed variable programming (MVP) to denote mathematical
programming problems with both continuous and categorical variables [11]. An
example is found in the design of a fixed-length thermal insulation system [36] in
which the objective is to minimize the power required subject to some reasonable
linear constraints.

In this problem, the system consists of a series of insulators of various material
types and thicknesses, each pair of which is separated by a metal plate, called a
heat intercept, to which power is applied to maintain it at a specified temperature.
The material thicknesses and intercept temperatures are the continuous variables,
while the number and types of insulators are categorical. In fact, the insulator
types are not even numeric, although each material type can be mapped to the
numeric value of its index into a list of seven possible material types that may
be selected. A further interesting complication is that the number of insulators,
which defines the problem dimension, is itself a design variable. This problem
was solved numerically in [36] using the algorithm introduced in [11]. Realistic
nonlinear constraints on system mass, tensile yield stress, and thermal contraction
were added to the problem in [3], and the resulting problem was solved
numerically using a pattern search filter method [?].

Because of the general lack of ordinality with categorical variables, MVP
problems present some unique challenges. For example, there is no general notion
of local optimality. To overcome this challenge, the user must provide a set-valued
neighborhood function that defines the set of discrete neighbors at every point.
Local optimality is then defined with respect to this function at the limit point. In
the example above, given a design of the system, discrete neighbors were formed
in three ways: swapping a single insulator for another of a different material, adding
an insulator and heat intercept at any location (and adjusting the continuous
variables appropriately), or deleting any insulator with its adjacent intercept. Once
the algorithm is appropriately modified, we guarantee that the resulting solution
could not be improved by moving to a neighboring point, as defined by these three
classes of neighbors.
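
A user-supplied neighborhood function for the categorical part of such a design
might look like the sketch below. It is a simplification made for illustration:
a design is reduced to the list of material indices of its insulators, the
continuous variables and their adjustment are ignored, and it is assumed that an
added insulator may be of any of the seven material types.

```python
def discrete_neighbors(materials, n_types=7):
    """Return the discrete neighbors of a design given as a list of material
    indices (0 .. n_types - 1), one per insulator. Duplicates are not removed."""
    neighbors = []
    # 1. Swap a single insulator for one of a different material.
    for i, m in enumerate(materials):
        for t in range(n_types):
            if t != m:
                neighbors.append(materials[:i] + [t] + materials[i + 1:])
    # 2. Add an insulator (of any type, by assumption) at any location.
    for i in range(len(materials) + 1):
        for t in range(n_types):
            neighbors.append(materials[:i] + [t] + materials[i:])
    # 3. Delete any insulator, keeping at least one in the system.
    if len(materials) > 1:
        for i in range(len(materials)):
            neighbors.append(materials[:i] + materials[i + 1:])
    return neighbors
```

Note that neighbors from the second and third classes have a different number of
insulators, so the problem dimension changes when the algorithm moves to them.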

The main modification to the algorithm consists of augmenting the POLL step
to include points in the set of discrete neighbors, along with other promising
points [11]. Convergence properties of GPS for MVP problems with a smooth
objective function and bound constraints on the continuous variables were
established in [?] and extended to general linear constraints and nonsmooth functions
in [?]. Convergence results for the GPS filter method for MVP problems with
nonlinear constraints were introduced in [2, 7].

The  class  of  MVP  problems  is  actually  quite  common  in  practice,  even  though 
the  field  is  very  new,  and  there  are  some  important  algorithmic  and  structural 
considerations  that  merit  further  research. 

4.2  Multiple  objectives 

It is almost always true that real optimization problems have multiple objectives.
They may not appear in this form, but scratch the surface and they will. For
example, a client might suggest a bound constraint on some function $y(x)$. But,
when asked about the value of the bound, the client will say it should be as small
as possible. In other words, the constraint is really another objective.

Another way multiple objectives show themselves is in documenting the problem
solution for the client. Presented with a solution to an optimization problem,
the client (or his/her boss) will want to know how much better/worse the objective
would be if a certain constraint were to be relaxed/tightened.

The reader will see that in both cases the standard objective synthesis approach
of minimizing a weighted sum of the individual objectives is not helpful. In both
cases, the decision maker wants to trade off one objective against the others. What
we need is to give the client a notion of the Pareto surface. To see a simple case of
the deficiencies in the weighted sum approach, see [29]. We do not recommend
this approach; however, [28] is an interesting way to find a single important Pareto
point.
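
A tiny numerical illustration of one well-known deficiency, constructed here and
not taken from the cited papers: when the Pareto front is not convex, some Pareto
points are never the minimizer of any weighted sum, whatever the weights.

```python
def pareto_points(points):
    """Return the points not dominated by any other (both objectives are
    minimized)."""
    def dominates(a, b):
        return all(ai <= bi for ai, bi in zip(a, b)) and a != b
    return [p for p in points if not any(dominates(q, p) for q in points)]

def weighted_sum_winners(points, n_weights=101):
    """Collect the minimizers of w * f1 + (1 - w) * f2 over a sweep of
    weights w in [0, 1]."""
    winners = set()
    for k in range(n_weights):
        w = k / (n_weights - 1)
        best = min(points, key=lambda p: w * p[0] + (1.0 - w) * p[1])
        winners.add(best)
    return winners

# Three candidate designs given by their objective values (f1, f2).
points = [(0.0, 1.0), (1.0, 0.0), (0.6, 0.6)]
```

All three designs are Pareto optimal, but the compromise design (0.6, 0.6) has
weighted sum 0.6 for every weight, while the better of the two extreme designs
never exceeds 0.5, so the sweep only ever returns the extremes.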

Since the filter approach is based on multi-objective ideas, we hope that our
filter approach can be adapted to provide helpful tools. However, this is not as
straightforward as one might hope.

4.3  Ability  to  handle  more  decision  variables 

It would be useful to extend MADS to handle hundreds of decision variables, on
problems where parallelism [37] alone would not be sufficient to solve the
problem. As with all direct search methods, we expect to see the number of function
values required to solve an arbitrary n-dimensional problem increase much faster
than n. Our goal is to find alternative direct search methods that slow the growth.


5  Conclusions 

Direct  search  methods  are  here  to  stay  as  a  valuable  subarea  of  optimization.  They 
are  interesting  theoretically,  and  they  are  indispensable  in  practice.  These  special 
issues  will  document  many  of  the  advances  that  have  been  made  in  the  area,  but 
much  remains  to  be  done. 

We have sketched some useful properties and limitations of MADS algorithms.
A researcher willing to build a strong theoretical background in nonsmooth
analysis and learn to work with users will find this a satisfying and fruitful
area in which to work. The experience of helping a user formulate and solve a
problem thought to be intractable is the ultimate validation for an applied
mathematician. Come on in, the water is fine.

A reader interested in obtaining software should visit [10, 24, 27, 38, 40, 42].